The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice or about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of the challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the prospect of prize money played only a minor role (16%). Although a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for it. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
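The patch-based strategy mentioned in the survey can be illustrated with a small sketch. It is not taken from any challenge submission: the function names `patch_starts` and `extract_patches`, the patch size, and the stride are assumptions chosen for illustration. The sketch tiles a 3D array with fixed-size patches and adds one extra patch per axis when the stride does not divide the volume evenly.

```python
import numpy as np

def patch_starts(dim, patch, stride):
    """Start indices along one axis so that patches cover the full extent."""
    if patch > dim:
        raise ValueError("patch is larger than the volume along this axis")
    starts = list(range(0, dim - patch + 1, stride))
    if starts[-1] != dim - patch:
        starts.append(dim - patch)  # extra patch flush with the border
    return starts

def extract_patches(volume, patch_size, stride):
    """Yield (corner, patch) pairs from a 3D array using a sliding window."""
    axes = [patch_starts(d, p, s)
            for d, p, s in zip(volume.shape, patch_size, stride)]
    pz, py, px = patch_size
    for z in axes[0]:
        for y in axes[1]:
            for x in axes[2]:
                yield (z, y, x), volume[z:z + pz, y:y + py, x:x + px]

# Hypothetical toy volume; real images would be far larger.
volume = np.random.rand(5, 4, 4)
patches = list(extract_patches(volume, (2, 2, 2), (2, 2, 2)))
print(len(patches))  # 3 * 2 * 2 = 12 patches
```

Returning the corner coordinates alongside each patch makes it possible to stitch patch-wise predictions back into a full-size output.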
translated by Google Translate
Generalization is an important attribute of machine learning models, particularly for those that are to be deployed in a medical context, where unreliable predictions can have real-world consequences. While the failure of models to generalize across datasets is typically attributed to a mismatch in the data distributions, performance gaps are often a consequence of biases in the 'ground-truth' label annotations. This is particularly important in the context of medical image segmentation of pathological structures (e.g., lesions), where the annotation process is much more subjective and is affected by a number of underlying factors, including the annotation protocol, rater education/experience, and clinical aims, among others. In this paper, we show that modeling annotation biases, rather than ignoring them, is a promising way of accounting for differences in annotation style across datasets. To this end, we propose a generalized conditioning framework to (1) learn and account for different annotation styles across multiple datasets using a single model, (2) identify similar annotation styles across different datasets in order to permit their effective aggregation, and (3) fine-tune a fully trained model to a new annotation style with just a few samples. Next, we present an image-conditioning approach to model annotation styles that correlate with specific image features, potentially enabling detection biases to be more easily identified.
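Discovering patient-specific imaging markers that are predictive of future disease outcomes can help us better understand individual-level heterogeneity in disease evolution. Indeed, deep learning models that can provide data-driven personalized markers are much more likely to be adopted in medical practice. In this work, we demonstrate that data-driven biomarker discovery can be achieved through a counterfactual synthesis process. We show how a deep conditional generative model can be used to perturb local imaging features in baseline images that are pertinent to subject-specific future disease evolution, resulting in a counterfactual image that is expected to have a different future outcome. Candidate biomarkers therefore result from examining the set of features that are perturbed in this process. Through several experiments on a large-scale, multi-scanner, multi-center multiple sclerosis (MS) clinical trial magnetic resonance imaging (MRI) dataset of relapsing-remitting MS (RRMS) patients, we demonstrate that our model produces counterfactuals with changes in imaging features that reflect established clinical markers predictive of future MRI lesional activity at the population level. Additional qualitative results illustrate that our model has the potential to discover novel and subject-specific predictive markers of future activity.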
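Large, annotated datasets are not widely available in medical image analysis because of the time, cost, and challenges associated with labeling large datasets. Unlabeled datasets are easier to obtain, and in many contexts it is feasible for an expert to provide labels for a small subset of images. This work presents an information-theoretic active learning framework that guides the optimal selection of images from the unlabeled pool to be labeled, based on maximizing the expected information gain (EIG) on an evaluation dataset. Experiments are performed on two different medical image classification datasets: multi-class diabetic retinopathy disease scale classification and multi-class skin lesion classification. Results indicate that, by adapting EIG to account for class imbalance, our proposed Adapted Expected Information Gain (AEIG) outperforms several popular baselines, including the diversity-based CoreSet and uncertainty-based maximum entropy sampling. Specifically, AEIG achieves about 95% of overall performance with only 19% of the training data, while other active learning approaches require around 25%. We show that, with careful design choices, our model can be integrated into existing deep learning classifiers.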
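For orientation, the uncertainty-based maximum entropy baseline named above can be sketched in a few lines. This is the baseline only, not the proposed AEIG criterion, and the function name `max_entropy_selection` and the toy probabilities are assumptions made for illustration.

```python
import numpy as np

def max_entropy_selection(probs, k):
    """Return indices of the k pool samples with the highest predictive entropy.

    probs: array of shape (n_samples, n_classes) holding softmax outputs
    of the current classifier on the unlabeled pool (assumed input format).
    """
    probs = np.asarray(probs, dtype=float)
    eps = 1e-12  # avoids log(0) for confident predictions
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    return np.argsort(-entropy)[:k]

# Hypothetical pool of three images with three classes each.
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # near-uniform prediction, highest entropy
    [0.60, 0.30, 0.10],  # intermediate
]
print(max_entropy_selection(pool_probs, k=2))  # [1 2]
```

The selected indices would be sent to the expert for labeling before the classifier is retrained on the enlarged labeled set.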
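The field of automatic biomedical image analysis crucially depends on robust and meaningful performance metrics for algorithm validation. Current metric usage, however, is often ill-informed and does not reflect the underlying domain interest. Here, we present a comprehensive framework that guides researchers towards choosing performance metrics in a problem-aware manner. Specifically, we focus on biomedical image analysis problems that can be interpreted as a classification task at image, object, or pixel level. The framework first compiles the properties of a given problem related to domain interest, target structure, data set, and algorithm output into a problem fingerprint, while also mapping the problem to the appropriate problem category, namely image-level classification, semantic segmentation, instance segmentation, or object detection. It then guides users through the process of selecting and applying a set of appropriate validation metrics while making them aware of potential pitfalls related to individual choices. In this paper, we describe the current status of the Metrics Reloaded recommendation framework, with the goal of obtaining constructive feedback from the image analysis community. The current version has been developed within an international consortium of more than 60 image analysis experts and will be made openly available as a user-friendly toolkit after community-driven optimization.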
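Precision medicine for chronic diseases such as multiple sclerosis (MS) involves choosing a treatment that best balances efficacy and side effects/preferences. Making this choice as early as possible is important, as delays in finding an effective therapy can lead to irreversible disability accrual. To this end, we present the first deep neural network model for individualized treatment decisions from baseline magnetic resonance imaging (MRI) for MS patients. Our model (a) predicts future new and enlarging T2-weighted (NE-T2) lesion counts on follow-up MRI on multiple treatments and (b) estimates the conditional average treatment effect (CATE), defined as the predicted future suppression of NE-T2 lesions, between different treatment options relative to placebo. Our model is validated on a proprietary federated dataset of 1817 multi-sequence MRIs acquired from MS patients during four multi-centre randomized clinical trials. Our framework achieves high average precision in the binarized regression of future NE-T2 lesions on five different treatments, identifies heterogeneous treatment effects, and provides a personalized treatment recommendation that accounts for treatment-associated risk (e.g., side effects, patient preference, administration difficulties).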
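The CATE definition above (predicted lesion suppression relative to placebo) can be made concrete with a minimal sketch. It assumes per-treatment predicted lesion counts are already available as an array; the functions `estimate_cate` and `recommend`, the additive risk penalty, and all numbers are hypothetical illustrations, not the model or decision rule used in the paper.

```python
import numpy as np

def estimate_cate(predicted_counts, placebo_index=0):
    """Predicted lesion suppression of each treatment relative to placebo.

    predicted_counts: array (n_patients, n_treatments) of predicted future
    NE-T2 lesion counts, one column per treatment arm (assumed layout).
    Positive values mean fewer predicted lesions than on placebo.
    """
    predicted_counts = np.asarray(predicted_counts, dtype=float)
    placebo = predicted_counts[:, [placebo_index]]
    return placebo - predicted_counts

def recommend(predicted_counts, risk_penalty, placebo_index=0):
    """Pick, per patient, the arm with the best risk-adjusted suppression.

    risk_penalty: one hypothetical cost per treatment, expressed in the
    same units as lesion counts (an assumption made for this sketch).
    """
    cate = estimate_cate(predicted_counts, placebo_index)
    utility = cate - np.asarray(risk_penalty, dtype=float)
    return np.argmax(utility, axis=1)

# Two hypothetical patients, three arms (column 0 is placebo).
counts = [[4.0, 1.0, 2.5],
          [0.5, 0.4, 0.1]]
print(estimate_cate(counts))
print(recommend(counts, risk_penalty=[0.0, 1.0, 0.2]))  # [1 2]
```

In this toy case the first patient is predicted to benefit enough from arm 1 to justify its higher penalty, while the second patient is steered to the lower-risk arm 2.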
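While the importance of automatic image analysis is continuously increasing, recent meta-research has revealed major flaws with respect to algorithm validation. Performance metrics are particularly key for meaningful, objective, and transparent performance assessment and validation of the automatic algorithms used, but relatively little attention has been given to the practical pitfalls of using specific metrics for a given image analysis task. These are typically related to (1) the disregard of inherent metric properties, such as the behaviour in the presence of class imbalance or small target structures, (2) the disregard of inherent data set properties, such as the non-independence of the test cases, and (3) the disregard of the actual biomedical domain interest that the metrics should reflect. The purpose of this dynamically updated document is to illustrate important limitations of performance metrics commonly applied in the field of image analysis. In this context, it focuses on biomedical image analysis problems that can be phrased as image-level classification, semantic segmentation, instance segmentation, or object detection tasks. The current version is based on a Delphi process on metrics conducted by an international consortium of image analysis experts from more than 60 institutions worldwide.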
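The sensitivity of overlap metrics to small target structures, pitfall (1) above, can be reproduced with a toy example. The sketch below uses the Dice similarity coefficient as one commonly applied overlap metric; the helper names, the image size, and the one-pixel shift are assumptions chosen for illustration rather than an example from the document itself.

```python
import numpy as np

def dice(pred, ref):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0  # convention chosen here for two empty masks
    return 2.0 * np.logical_and(pred, ref).sum() / denom

def square_mask(size, top, left, side):
    """Binary image of shape (size, size) containing one filled square."""
    mask = np.zeros((size, size), dtype=bool)
    mask[top:top + side, left:left + side] = True
    return mask

# The same one-pixel horizontal shift, applied to a small and a large structure.
small = dice(square_mask(64, 10, 11, 2), square_mask(64, 10, 10, 2))
large = dice(square_mask(64, 10, 11, 20), square_mask(64, 10, 10, 20))
print(small)  # 0.5
print(large)  # 0.95
```

An identical localization error halves the score for the 2x2 structure but barely affects the 20x20 one, which is why averaged overlap scores can be dominated by small targets.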
This white paper lays out a vision of research and development in the field of artificial intelligence for the next decade (and beyond). Its denouement is a cyber-physical ecosystem of natural and synthetic sense-making, in which humans are integral participants: what we call "shared intelligence". This vision is premised on active inference, a formulation of adaptive behavior that can be read as a physics of intelligence, and which inherits from the physics of self-organization. In this context, we understand intelligence as the capacity to accumulate evidence for a generative model of one's sensed world, also known as self-evidencing. Formally, this corresponds to maximizing (Bayesian) model evidence, via belief updating over several scales: i.e., inference, learning, and model selection. Operationally, this self-evidencing can be realized via (variational) message passing or belief propagation on a factor graph. Crucially, active inference foregrounds an existential imperative of intelligent systems; namely, curiosity or the resolution of uncertainty. This same imperative underwrites belief sharing in ensembles of agents, in which certain aspects (i.e., factors) of each agent's generative world model provide a common ground or frame of reference. Active inference plays a foundational role in this ecology of belief sharing, leading to a formal account of collective intelligence that rests on shared narratives and goals. We also consider the kinds of communication protocols that must be developed to enable such an ecosystem of intelligences and motivate the development of a shared hyper-spatial modeling language and transaction protocol, as a first (and key) step towards such an ecology.
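The statement that self-evidencing corresponds to maximizing (Bayesian) model evidence can be written out using the standard variational bound. The notation is an assumption introduced here, not taken from the white paper: $o$ denotes observations, $s$ latent states, $p(o, s)$ the generative model, and $q(s)$ an approximate posterior belief.

```latex
\begin{aligned}
\ln p(o) &= \ln \int p(o, s)\, ds
  \;\ge\; \mathbb{E}_{q(s)}\!\left[ \ln p(o, s) - \ln q(s) \right] = -F[q], \\
F[q] &= D_{\mathrm{KL}}\!\left[ q(s) \,\|\, p(s \mid o) \right] - \ln p(o).
\end{aligned}
```

Because the KL divergence is non-negative, minimizing the variational free energy $F[q]$ through belief updating both tightens the bound on log evidence and brings $q(s)$ closer to the exact posterior.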
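An increasing amount of research is being devoted to applying machine learning methods to electronic health record (EHR) data for various clinical tasks. This growing area of research has exposed limitations in the accessibility of EHR datasets for all, as well as in the reproducibility of different modeling frameworks. One reason for these limitations is the lack of standardized pre-processing pipelines. MIMIC is a freely available EHR dataset in a raw format that has been used in numerous studies. The absence of standardized pre-processing steps is a major barrier to the wider adoption of the dataset. It also leads to different cohorts being used in downstream tasks, limiting the ability to compare results among similar studies. Contrasting studies also use various distinct performance metrics, which can greatly reduce the ability to compare model results. In this work, we provide an end-to-end, fully customizable pipeline to extract, clean, and pre-process data, and to predict and evaluate on the fourth version of the MIMIC dataset (MIMIC-IV) for ICU and non-ICU-related clinical time-series prediction tasks. The tool is publicly available at https://github.com/healthylaife/mimic-imic-iv-data-pipeline.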
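Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion-parameter, densely activated Transformer language model, which we call the Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system that enables highly efficient training across multiple TPU Pods. We demonstrate the continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the fine-tuned state of the art on a suite of multi-step reasoning tasks and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements with model scale, meaning that performance increased steeply as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis of bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.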